{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 用于 TensorFlow 模型的 XLA 集成"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "加速线性代数，也称为XLA，是一个用于加速TensorFlow模型运行时间的编译器。从[官方文档](https://www.tensorflow.org/xla)中可以看到：\n",
    "\n",
    "XLA（加速线性代数）是一种针对线性代数的特定领域编译器，可以在可能不需要更改源代码的情况下加速TensorFlow模型。\n",
    "\n",
    "在TensorFlow中使用XLA非常简单——它包含在`tensorflow`库中，并且可以使用任何图创建函数中的`jit_compile`参数来触发，例如[`tf.function`](https://www.tensorflow.org/guide/intro_to_graphs)。在使用Keras方法如`fit()`和`predict()`时，只需将`jit_compile`参数传递给`model.compile()`即可启用XLA。然而，XLA不仅限于这些方法 - 它还可以用于加速任何任意的`tf.function`。\n",
    "\n",
    "在🤗 Transformers中，几个TensorFlow方法已经被重写为与XLA兼容，包括[GPT2](https://huggingface.co/docs/transformers/model_doc/gpt2)、[T5](https://huggingface.co/docs/transformers/model_doc/t5)和[OPT](https://huggingface.co/docs/transformers/model_doc/opt)等文本生成模型，以及[Whisper](https://huggingface.co/docs/transformers/model_doc/whisper)等语音处理模型。\n",
    "\n",
    "虽然确切的加速倍数很大程度上取决于模型，但对于🤗 Transformers中的TensorFlow文本生成模型，我们注意到速度提高了约100倍。本文档将解释如何在这些模型上使用XLA获得最大的性能。如果您有兴趣了解更多关于基准测试和我们在XLA集成背后的设计哲学的信息，我们还将提供额外的资源链接。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用 XLA 运行 TensorFlow 函数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "让我们考虑以下TensorFlow 中的模型："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "\n",
    "model = tf.keras.Sequential(\n",
    "    [tf.keras.layers.Dense(10, input_shape=(10,), activation=\"relu\"), tf.keras.layers.Dense(5, activation=\"softmax\")]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "上述模型接受维度为 `(10,)` 的输入。我们可以像下面这样使用模型进行前向传播："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate random inputs for the model.\n",
    "batch_size = 16\n",
    "input_vector_dim = 10\n",
    "random_inputs = tf.random.normal((batch_size, input_vector_dim))\n",
    "\n",
    "# Run a forward pass.\n",
    "_ = model(random_inputs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "为了使用 XLA 编译的函数运行前向传播，我们需要执行以下操作："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "xla_fn = tf.function(model, jit_compile=True)\n",
    "_ = xla_fn(random_inputs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`model`的默认`call()`函数用于编译XLA图。但如果你想将其他模型函数编译成XLA，也是可以的，如下所示："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_xla_fn = tf.function(model.my_xla_fn, jit_compile=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 在🤗 Transformers库中使用XLA运行TensorFlow文本生成模型"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "要在🤗 Transformers中启用XLA加速生成，您需要安装最新版本的`transformers`。您可以通过运行以下命令来安装它：\n",
    "\n",
    "```bash\n",
    "pip install transformers --upgrade\n",
    "```\n",
    "\n",
    "然后您可以运行以下代码："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "from transformers import AutoTokenizer, TFAutoModelForCausalLM\n",
    "\n",
    "# Will error if the minimal version of Transformers is not installed.\n",
    "from transformers.utils import check_min_version\n",
    "\n",
    "check_min_version(\"4.21.0\")\n",
    "\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(\"openai-community/gpt2\", padding_side=\"left\", pad_token=\"</s>\")\n",
    "model = TFAutoModelForCausalLM.from_pretrained(\"openai-community/gpt2\")\n",
    "input_string = [\"TensorFlow is\"]\n",
    "\n",
    "# One line to create an XLA generation function\n",
    "xla_generate = tf.function(model.generate, jit_compile=True)\n",
    "\n",
    "tokenized_input = tokenizer(input_string, return_tensors=\"tf\")\n",
    "generated_tokens = xla_generate(**tokenized_input, num_beams=2)\n",
    "\n",
    "decoded_text = tokenizer.decode(generated_tokens[0], skip_special_tokens=True)\n",
    "print(f\"Generated -- {decoded_text}\")\n",
    "# Generated -- TensorFlow is an open-source, open-source, distributed-source application # framework for the"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "正如您所注意到的，在`generate()`上启用XLA只需要一行代码。其余部分代码保持不变。然而，上面的代码片段中有一些与XLA相关的注意事项。您需要了解这些注意事项，以充分利用XLA可能带来的性能提升。我们将在下面的部分讨论这些内容。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 需要关注的注意事项"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "当您首次执行启用XLA的函数（如上面的`xla_generate()`）时，它将在内部尝试推断计算图，这是一个耗时的过程。这个过程被称为[“tracing”](https://www.tensorflow.org/guide/intro_to_graphs#when_is_a_function_tracing)。\n",
    "\n",
    "您可能会注意到生成时间并不快。连续调用`xla_generate()`（或任何其他启用了XLA的函数）不需要再次推断计算图，只要函数的输入与最初构建计算图时的形状相匹配。对于具有固定输入形状的模态（例如图像），这不是问题，但如果您正在处理具有可变输入形状的模态（例如文本），则必须注意。\n",
    "\n",
    "为了确保`xla_generate()`始终使用相同的输入形状，您可以在调用`tokenizer`时指定`padding`参数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "from transformers import AutoTokenizer, TFAutoModelForCausalLM\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(\"openai-community/gpt2\", padding_side=\"left\", pad_token=\"</s>\")\n",
    "model = TFAutoModelForCausalLM.from_pretrained(\"openai-community/gpt2\")\n",
    "input_string = [\"TensorFlow is\"]\n",
    "\n",
    "xla_generate = tf.function(model.generate, jit_compile=True)\n",
    "\n",
    "# Here, we call the tokenizer with padding options.\n",
    "tokenized_input = tokenizer(input_string, pad_to_multiple_of=8, padding=True, return_tensors=\"tf\")\n",
    "\n",
    "generated_tokens = xla_generate(**tokenized_input, num_beams=2)\n",
    "decoded_text = tokenizer.decode(generated_tokens[0], skip_special_tokens=True)\n",
    "print(f\"Generated -- {decoded_text}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通过这种方式，您可以确保`xla_generate()`的输入始终具有它跟踪的形状，从而加速生成时间。您可以使用以下代码来验证这一点："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "import tensorflow as tf\n",
    "from transformers import AutoTokenizer, TFAutoModelForCausalLM\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(\"openai-community/gpt2\", padding_side=\"left\", pad_token=\"</s>\")\n",
    "model = TFAutoModelForCausalLM.from_pretrained(\"openai-community/gpt2\")\n",
    "\n",
    "xla_generate = tf.function(model.generate, jit_compile=True)\n",
    "\n",
    "for input_string in [\"TensorFlow is\", \"TensorFlow is a\", \"TFLite is a\"]:\n",
    "    tokenized_input = tokenizer(input_string, pad_to_multiple_of=8, padding=True, return_tensors=\"tf\")\n",
    "    start = time.time_ns()\n",
    "    generated_tokens = xla_generate(**tokenized_input, num_beams=2)\n",
    "    end = time.time_ns()\n",
    "    print(f\"Execution time -- {(end - start) / 1e6:.1f} ms\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在Tesla T4 GPU上，您可以期望如下的输出：\n",
    "\n",
    "```bash\n",
    "Execution time -- 30819.6 ms\n",
    "\n",
    "Execution time -- 79.0 ms\n",
    "\n",
    "Execution time -- 78.9 ms\n",
    "```\n",
    "\n",
    "第一次调用`xla_generate()`会因为`tracing`而耗时，但后续的调用会快得多。请注意，任何时候对生成选项的更改都会触发重新`tracing`，从而导致生成时间减慢。\n",
    "\n",
    "在本文档中，我们没有涵盖🤗 Transformers提供的所有文本生成选项。我们鼓励您阅读文档以了解高级用例。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 附加资源"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "以下是一些附加资源，如果您想深入了解在🤗 Transformers和其他库下使用XLA：\n",
    "\n",
    "* [这个Colab Notebook](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/91_tf_xla_generate.ipynb) 提供了一个互动演示，让您可以尝试使用XLA兼容的编码器-解码器（例如[T5](https://huggingface.co/docs/transformers/model_doc/t5)）和仅解码器（例如[GPT2](https://huggingface.co/docs/transformers/model_doc/gpt2)）文本生成模型。\n",
    "\n",
    "* [这篇博客文章](https://huggingface.co/blog/tf-xla-generate) 提供了XLA兼容模型的比较基准概述，以及关于在TensorFlow中使用XLA的友好介绍。\n",
    "\n",
    "* [这篇博客文章](https://blog.tensorflow.org/2022/11/how-hugging-face-improved-text-generation-performance-with-xla.html) 讨论了我们在🤗 Transformers中为TensorFlow模型添加XLA支持的设计理念。\n",
    "\n",
    "* 推荐用于更多学习XLA和TensorFlow图的资源：\n",
    "    * [XLA：面向机器学习的优化编译器](https://www.tensorflow.org/xla)\n",
    "    * [图和tf.function简介](https://www.tensorflow.org/guide/intro_to_graphs)\n",
    "    * [使用tf.function获得更好的性能](https://www.tensorflow.org/guide/function)"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 4
}
